34 PART 1 Getting Started with Biostatistics

But the idea of doing a census to calculate such a parameter is not practical. Even

if we somehow had a list of everyone in the city we could contact, it would not

be feasible to visit all of them and measure their SBP. Nor would it be necessary.

Using inferential statistics, we could draw a sample from this population, measure

their SBPs, and calculate the mean as a sample statistic. Using this approach, we

could estimate the mean SBP of the population.
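
As a rough illustration of this idea, the following Python sketch simulates a hypothetical city population, draws a simple random sample, and compares the sample mean to the population mean. The population size, the mean of 120 mmHg, the standard deviation of 15 mmHg, and the function name estimate_mean are all made up for the example; they aren't taken from real data.

```python
import random
import statistics

# Hypothetical population: SBP values (mmHg) for 100,000 city residents.
# The mean (120) and standard deviation (15) are assumptions for illustration.
rng = random.Random(42)
population = [rng.gauss(120, 15) for _ in range(100_000)]


def estimate_mean(values, sample_size, rng):
    """Draw a simple random sample without replacement and return its mean."""
    sample = rng.sample(values, sample_size)
    return statistics.mean(sample)


# The parameter: in real life we usually can't compute this.
population_mean = statistics.mean(population)

# The sample statistic: our estimate of the parameter from 200 people.
sample_mean = estimate_mean(population, 200, rng)

print(f"Population mean SBP: {population_mean:.1f} mmHg")
print(f"Sample mean SBP (n=200): {sample_mean:.1f} mmHg")
```

In a real study, only the sample mean would be available. The simulation computes the population mean only so the two values can be compared.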

But drawing a sample that is representative of the background population depends

on probability (as well as other factors). In the following sections, we explain why

samples are valid but imperfect reflections of the population from which they’re

drawn. We also describe the basics of probability distributions. For a more

extensive discussion of sampling, see Chapter 6.

Recognizing that sampling isn’t perfect

As used in epidemiologic research, the terms population and sample can be defined

this way:

» Population: All individuals in a defined target population. For example, this

may be all individuals in the United States living with a diagnosis of Type II

diabetes.

» Sample: A subset of the target population actually selected to participate in a

study. For example, this could be patients in the United States living with

Type II diabetes who visit a particular clinic and meet other qualification

criteria for the study.

Any sample, no matter how carefully it is selected, is only an imperfect reflection

of the population. This is due to the unavoidable occurrence of random sampling

fluctuations called sampling error.
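
To make sampling error concrete, here is a minimal Python sketch that draws several random samples from the same simulated population and reports how far each sample mean lands from the population mean. The population values (mean 120, standard deviation 15), the sample size of 50, and the function name sampling_errors are assumptions chosen for illustration, not real data.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 50,000 values; mean 120 and SD 15 are assumed.
population = [rng.gauss(120, 15) for _ in range(50_000)]


def sampling_errors(values, sample_size, n_samples, rng):
    """Return (sample mean - population mean) for repeated random samples."""
    true_mean = statistics.mean(values)
    return [
        statistics.mean(rng.sample(values, sample_size)) - true_mean
        for _ in range(n_samples)
    ]


# Five samples of 50 drawn from the same population.
errors = sampling_errors(population, 50, 5, rng)
for i, err in enumerate(errors, start=1):
    print(f"Sample {i}: sampling error = {err:+.2f}")
```

Each sample is drawn the same way from the same population, yet the sample means differ from the population mean by varying amounts. That variation is the random sampling fluctuation described above.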

To illustrate sampling error, we obtained a data set containing the number of

private and public airports in each of the United States and the District of Columbia

in 2011 from Statista (available at

https://www.statista.com/statistics/185902/us-civil-and-joint-use-airports-2008/). We started by making a

histogram of the entire data set, which would be considered a census because it

contains the entire population of states. A histogram is a visualization used to

show the distribution of numerical data, and is described more extensively in

Chapter 9. Here, we briefly summarize how to read a histogram:

» A histogram looks like a bar chart. It is specifically crafted to display a

distribution.